filter_lua: Add chunk mode for processing multiple records #8478
Documentation for fluent/fluent-bit#8478 Signed-off-by: Richard Treu <richard.treu@sap.com>
This commit will introduce a chunk_mode for lua filter. It can be needed for use cases like parallelization (see lua lanes).

Please note that the lua functions will take only two arguments:

    function process_records(tag, records)
        if records and type(records) == "table" then
            for i, record_row in ipairs(records) do
                local timestamp = record_row.timestamp
                local record = record_row.record
                print("Timestamp entry:", timestamp.sec, timestamp.nsec)
                print("Record entry:", record.message)
            end
        else
            print("Error: Invalid 'records' table or nil")
        end
        return records
    end

The returned table must be in the same format (table of timestamp and record pairs).

This mode currently only supports time_as_table by default and does always emit the returned records. There is no return code to be set.

Signed-off-by: Richard Treu <richard.treu@sap.com>
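To try a callback of this shape outside Fluent Bit, a small harness can hand it a table built by hand. This is a minimal sketch: the sample values, the helper `make_sample_chunk`, and the added `tag` field are made up for illustration, and the row layout (`timestamp.sec`, `timestamp.nsec`, `record`) is only what the commit message above describes.

```lua
-- Hypothetical sample chunk: a list of rows, each holding a timestamp
-- table (sec, nsec) and a record table. All values are invented.
local function make_sample_chunk()
  return {
    { timestamp = { sec = 1700000000, nsec = 100 }, record = { message = "first" } },
    { timestamp = { sec = 1700000001, nsec = 200 }, record = { message = "second" } },
  }
end

-- Chunk-mode style callback: two arguments, returns the records table.
-- It must be a global function so that the filter can look it up by name.
function process_records(tag, records)
  if type(records) ~= "table" then
    return records
  end
  for _, row in ipairs(records) do
    -- Annotate each record with the tag it arrived under (illustrative only).
    row.record.tag = tag
  end
  return records
end

-- Local dry run: call the callback the way the filter is described to call it.
local out = process_records("app.log", make_sample_chunk())
for i, row in ipairs(out) do
  print(i, row.timestamp.sec, row.timestamp.nsec, row.record.message, row.record.tag)
end
```

Running the file with a plain `lua` interpreter is enough to check that the callback walks every row and hands back the same structure it received.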
@tarruda would you take a look?
@drbugfinder-work Thanks for this PR. I skimmed quickly through the PR and lua lanes documentation in order to understand the end goal, and I assume this is how everything will work:
(please let me know if I missed something): With that in mind, I have some questions:
The reason these questions are important is that this approach might have a lot of overhead, and it adds quite a lot of complexity to the lua_filter implementation. Spawning threads is expensive, and only worth it if you have a huge amount of data. If fluent-bit can never pass more than a few hundred records to the lua callback, I suspect splitting the workload across threads will make things even slower (this is why I suggest benchmarking).
Hi @tarruda, you're absolutely right with that summary. The main goal is to parallelize record processing in Lua. However, there might be other use cases that could make use of processing more than one record at once within Lua (e.g. trend analysis).

I'm testing this in different environments, especially in K8s clusters with a couple hundred instances of Fluent Bit. In high load scenarios, I can see chunks of up to around 300-500 records coming into the Lua filter per call. Under normal circumstances it fluctuates between 1 and 50 records per filter call, depending on the log volume at the pipeline and the Flush settings.

Please keep in mind that this change to the plugin does not make direct use of the Lua Lanes library; that is just one use case. The library has to be installed by the user if it should be used. How to distribute the records within Lua across different worker threads is up to the user. I've added a simple example (without a limit on worker threads) to the documentation to show how to create a thread for every incoming record (this would of course create an OS thread for every record, which might or might not perform well). The user could use the Linda objects of the Lanes lib to share data between threads, if needed.

I don't think that this adds much complexity to the lua filter plugin itself, as it just passes a table of records instead of a single record to Lua, but it gives you the option to make use of all the available data in that chunk.
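The trend-analysis use case mentioned above can be sketched without any threading library. The example assumes the chunk-mode row layout from the PR description (`record_row.record`); the `level` field and the two output fields `chunk_size` and `chunk_error_ratio` are hypothetical names chosen for illustration.

```lua
-- Hypothetical chunk-level aggregation: count how many records in the chunk
-- carry level == "error", then attach the chunk-wide figures to every record.
function annotate_error_ratio(tag, records)
  local total, errors = 0, 0

  -- First pass: gather statistics over the whole chunk.
  for _, row in ipairs(records) do
    total = total + 1
    if row.record.level == "error" then
      errors = errors + 1
    end
  end

  -- Guard against an empty chunk to avoid dividing by zero.
  local ratio = 0
  if total > 0 then
    ratio = errors / total
  end

  -- Second pass: write the aggregate back into each record.
  for _, row in ipairs(records) do
    row.record.chunk_size = total
    row.record.chunk_error_ratio = ratio
  end

  return records
end
```

A per-record callback could not compute this ratio, because it never sees more than one record per call. That is the kind of benefit the comment points to beyond parallelization.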
@drbugfinder-work Since you are seeing an increase in throughput and this change is backwards compatible, LGTM
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Over the last months we noticed a memory leak (not detected by valgrind) caused by this change.
This PR will introduce a chunk_mode for lua filter. It can be needed for use cases like parallelization (see lua lanes).
Please note that the lua functions will take only two arguments:
Its configuration looks like this:
The returned table must be in the same format (table of timestamp and record pairs).
This mode currently only supports time_as_table by default and does always emit the returned records. There is no return code to be set.
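As an illustration of the return format, the following sketch builds a fresh output table in which every row is again a timestamp and record pair. The function name `uppercase_messages` and the `message` field are assumptions for this example. The description does not say whether rows may be dropped or added, so the sketch keeps exactly one output row per input row.

```lua
-- Hypothetical chunk-mode callback that returns a new table in the same
-- format it received: a list of { timestamp = {sec, nsec}, record = {...} }.
function uppercase_messages(tag, records)
  local out = {}
  for i, row in ipairs(records) do
    local record = row.record
    -- Only touch string messages; other records pass through unchanged.
    if type(record.message) == "string" then
      record.message = string.upper(record.message)
    end
    out[i] = {
      -- Timestamps stay in table form (sec, nsec), as in time_as_table.
      timestamp = { sec = row.timestamp.sec, nsec = row.timestamp.nsec },
      record = record,
    }
  end
  -- No return code: the table itself is the only return value.
  return out
end
```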
A use case for this can be the parallel execution of lua filters by using the lua lanes library.
Please see example here (remember to install lua lanes first, e.g. apt install luarocks && luarocks install lanes, and check the path in the lanes_example.lua).
Please see valgrind output:
Documentation PR:
fluent/fluent-bit-docs#1310
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.